---
layout: page
title: Exercise 2
permalink: /scripts/exercise2/
parent: R Scripts
nav_order: 3
---
Researchers designed a field experiment in rural Kenya to investigate why the poor are constrained in their ability to save money. In this experiment, researchers randomly varied individuals’ access to technology that would enable greater security of investment. By observing the impact of these technologies on the amount of money saved, researchers were able to identify key barriers to saving.
This exercise is based on: Dupas, Pascaline and Jonathan Robinson. 2013. “Why Don’t the Poor Save More? Evidence from Health Savings Experiments.” American Economic Review, 103(4): 1138-1171, http://dx.doi.org/10.1257/aer.103.4.1138.
Dupas and Robinson worked with 113 ROSCAs (Rotating Savings and Credit
Associations). A ROSCA is a group of individuals who come together and
make regular cyclical contributions to a fund (called the “pot”), which
is then given as a lump sum to one member in each cycle. In their
experiment, the researchers randomly assigned these ROSCAs to one of
five study arms. In this exercise, we will focus on three study arms
(one control and two treatment arms). The data file,
`rosca.csv`, is extracted from their original data. For the sake of
simplicity, it excludes individuals who received multiple
treatments.
Individuals in all study arms were encouraged to save for investments in preventative health products and were asked to set a health goal for themselves at the beginning of the study. They were also assigned randomly to a treatment condition:
• In the first treatment group (Safe Box), respondents were given a box locked with a padlock, and the key to the padlock was provided to the participants. They were asked to record what health product they were saving for and its cost. This treatment is designed to estimate the effect of having a safe and designated storage technology for preventative health savings.
• In the second treatment group (Locked Box), respondents were given a locked box, but not the key to the padlock. The respondents were instructed to call the program officer once they had reached their saving goal, and the program officer would then meet the participant and open the Locked Box at the shop where the product is purchased.
• Compared to the safe box, the locked box offered a stronger commitment through earmarking: the money saved could only be used for the pre-specified purpose.
Participants were interviewed again 6 months and 12 months later. In
this exercise, our outcome of interest is the amount (in Kenyan
shillings) spent on preventative health products after 12 months:
`fol2_amtinvest`.
| Name | Description |
|---|---|
| `bg_female` | 1 if female, and 0 otherwise |
| `bg_married` | 1 if married, and 0 otherwise |
| `bg_b1_age` | age at baseline |
| `encouragement` | 1 if encouragement only (control group), and 0 otherwise |
| `safe_box` | 1 if safe box treatment, and 0 otherwise |
| `locked_box` | 1 if lock box treatment, and 0 otherwise |
| `fol2_amtinvest` | Amount invested in health products |
| `has_followup2` | 1 if appears in 2nd followup (after 12 months), and 0 otherwise |
As with any new dataset, it is important to first get acquainted with its structure and its key variables.
• Load the data set.
• How many participants are there in total?
• What is the percentage of male and female participants?
• How many participants are married?
• What is the mean age of participants?
Hint: Use `read.csv()` to load the file, and describe the dataset
using `nrow()` or `dim()`. For the rest of the
questions, try out different approaches: use `summary()` on
the dataset, use `mean()` on individual variables, and consider
whether `table()` can help.
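
The functions named in the hint can be combined roughly as follows. This is only a sketch on a small made-up data frame (`rosca_toy` is a hypothetical stand-in, not the real data), so the numbers it produces say nothing about the actual study. The use of `na.rm = TRUE` is a precaution that assumes some variables may contain missing values.

```r
# Toy stand-in for rosca.csv; all values are invented for illustration.
# With the real file you would start from: rosca <- read.csv("rosca.csv")
rosca_toy <- data.frame(
  bg_female  = c(1, 0, 1, 1, 0, 1),
  bg_married = c(1, 1, 0, 1, 0, 1),
  bg_b1_age  = c(34, 41, 29, NA, 52, 38)
)

# Number of participants (rows), and rows plus columns
n_participants <- nrow(rosca_toy)
dim(rosca_toy)

# Percentage of male (0) and female (1) participants
gender_pct <- prop.table(table(rosca_toy$bg_female)) * 100

# Number of married participants
n_married <- sum(rosca_toy$bg_married == 1, na.rm = TRUE)

# Mean age at baseline, ignoring missing values
mean_age <- mean(rosca_toy$bg_b1_age, na.rm = TRUE)

# Quick overview of every variable at once
summary(rosca_toy)
```

Note that `mean()` returns `NA` as soon as one value is missing, which is why checking `summary()` for `NA` counts first is a useful habit.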
• Create a single factor variable `treatment` that takes
the value `control` if participants received only
encouragement, `safebox` if they received a safe box, and
`lockbox` if they received a locked box.
• How many individuals are in the control group? How many individuals are in each of the treatment arms?
• Finally, use the `table()` command on the new
variable.
Hint: Create a new variable in the dataset called
`treatment` that takes the values `control`,
`safebox` and `lockbox`, depending on whether
`encouragement`, `safe_box` and `locked_box` have
the value of 1 respectively in the original data. There are two ways of
doing it, as we did above:
• Creating an empty `treatment` variable and filling in the
`control`, `safebox`, and `lockbox`
values through indexing;
• Using a nested `ifelse()` command.
• Try both approaches and compare your results.
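
The two approaches might look like the sketch below. It runs on a made-up data frame (`rosca_toy` and `treatment2` are hypothetical names, and the indicator values are invented), and it assumes each participant has exactly one of the three indicators set to 1.

```r
# Toy indicator columns; invented values, one treatment per row.
rosca_toy <- data.frame(
  encouragement = c(1, 0, 0, 1, 0, 0),
  safe_box      = c(0, 1, 0, 0, 1, 0),
  locked_box    = c(0, 0, 1, 0, 0, 1)
)
arm_levels <- c("control", "safebox", "lockbox")

# Approach 1: create an empty variable, then fill it in by indexing
rosca_toy$treatment <- NA
rosca_toy$treatment[rosca_toy$encouragement == 1] <- "control"
rosca_toy$treatment[rosca_toy$safe_box == 1]      <- "safebox"
rosca_toy$treatment[rosca_toy$locked_box == 1]    <- "lockbox"
rosca_toy$treatment <- factor(rosca_toy$treatment, levels = arm_levels)

# Approach 2: nested ifelse(); rows matching no indicator become NA
treatment2 <- ifelse(rosca_toy$encouragement == 1, "control",
              ifelse(rosca_toy$safe_box == 1, "safebox",
              ifelse(rosca_toy$locked_box == 1, "lockbox", NA)))
treatment2 <- factor(treatment2, levels = arm_levels)

# Compare the two results, then count participants per arm
same_result <- identical(rosca_toy$treatment, treatment2)
arm_counts  <- table(rosca_toy$treatment)
arm_counts
```

Setting `levels` explicitly keeps the control group first in every table, rather than relying on alphabetical ordering.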
• Subset the data so that it contains only participants who were interviewed 12 months later during the second followup. Use this subset for the subsequent analyses.
• How many participants are left in each treatment group of this subset?
• Calculate drop-out rates for each group. Does the drop-out rate differ across the treatment conditions?
• What does this result suggest about the internal and external validity of this study?
Hint: Use the `subset()` function to create a new dataset
and use the `table()` command on your `treatment` variable.
`table()` can also be used to calculate the drop-out
rates.
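
One possible way to put the hint into practice is sketched below, again on invented data (`rosca_toy` and `followup_toy` are hypothetical names, and the follow-up pattern is made up). The drop-out rate is taken here to be one minus the share of each group that appears in the second followup.

```r
# Toy data: 4 participants per arm, with invented follow-up status.
arm_levels <- c("control", "safebox", "lockbox")
rosca_toy <- data.frame(
  treatment     = factor(rep(arm_levels, each = 4), levels = arm_levels),
  has_followup2 = c(1, 1, 1, 0,   # control
                    1, 1, 0, 0,   # safebox
                    1, 1, 1, 1)   # lockbox
)

# Keep only participants who appear in the second followup
followup_toy <- subset(rosca_toy, has_followup2 == 1)

# Group sizes before and after subsetting
n_start <- table(rosca_toy$treatment)
n_left  <- table(followup_toy$treatment)

# Drop-out rate per group
dropout_rate <- 1 - n_left / n_start
dropout_rate

# Alternative: a two-way table with row proportions; the column for
# has_followup2 == 0 then holds the drop-out rate of each group
followup_shares <- prop.table(
  table(rosca_toy$treatment, rosca_toy$has_followup2),
  margin = 1
)
followup_shares
```

Because `treatment` is a factor, `table()` still lists a group even if none of its members remain after subsetting, so the two tables line up for the division.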